19 research outputs found

    An Unsupervised Knowledge Free Algorithm for the Learning of Morphology in Natural Languages - Master's Thesis, May 2002

    This thesis describes an unsupervised system to learn natural language morphology, specifically suffix identification from unannotated text. The system is language independent, so that it can learn the morphology of any human language. For English this means identifying “-s”, “-ing”, “-ed”, “-tion” and many other suffixes, in addition to learning which stems they attach to. The system uses no prior knowledge, such as part-of-speech tags, and learns the morphology by simply reading in a body of unannotated text. The system consists of a generative probabilistic model, which is used to evaluate hypotheses, and a directed search and a hill-climbing search, which are used in conjunction to find a highly probable hypothesis. Experiments applying the system to English and Polish are described.

    Expected Dependency Pair Match: Predicting translation quality with expected syntactic structure

    Recent efforts aimed at improving over standard machine translation evaluation methods (BLEU, TER) have investigated mechanisms for accounting for allowable wording differences either in terms of syntactic structure or synonyms/paraphrases. This paper explores an approach for combining scores from partial syntactic dependency matches with standard local n-gram matches using a statistical parser, and taking advantage of parse probabilities in deriving expected scores based on the N-best parses for the hypothesized sentence translation. The new scoring metric, Expected Dependency Pair Match (EDPM), is shown to be superior to BLEU and TER in terms of correlation to human judgements and as a per-document and per-sentence predictor of HTER, using mean subtraction to account for document difficulty. Further, we explore the potential benefit of combining the n-gram and syntactic features of EDPM with the alternative wording features of TERp, with experiments showing that there is a benefit to accounting for syntactic structure on top of the semantic equivalency features.

    Serrated Lesions of the Colorectum: Review and Recommendations From an Expert Panel

    Serrated lesions of the colorectum are the precursors of perhaps one-third of colorectal cancers. Cancers arising in serrated lesions are usually in the proximal colon, and account for a disproportionate fraction of cancer identified after colonoscopy

    Abstract

    This paper describes a system for the unsupervised learning of morphological suffixes and stems from word lists. The system is composed of a generative probability model and hill-climbing and directed search algorithms. By extracting and examining morphologically rich subsets of an input lexicon, the directed search identifies highly productive paradigms. The hill-climbing algorithm then further maximizes the probability of the hypothesis. Quantitative results are shown by measuring the accuracy of the morphological relations identified. Experiments in English and Polish, as well as comparisons with another recent unsupervised morphology learning algorithm, demonstrate the effectiveness of this technique.

    Unsupervised Learning of Morphology Using a Novel Directed Search

    This paper describes a system for the unsupervised learning of morphological suffixes and stems from word lists. The system is composed of a generative probability model and a novel search algorithm